Space-Economical Construction of Index Structures for All Suffixes of a String

Authors

  • Shunsuke Inenaga
  • Ayumi Shinohara
  • Masayuki Takeda
  • Hideo Bannai
  • Setsuo Arikawa
Abstract

The minimum all-suffixes directed acyclic word graph (MASDAWG) of a string w has |w| + 1 initial nodes, where the DAG induced by all nodes reachable from the k-th initial node conforms with the DAWG of the k-th suffix of w. A new space-economical algorithm for the construction of MASDAWG(w) is presented. The algorithm reads a given string w from right to left and constructs MASDAWG(w) without suffix links. It runs in time linear in the output size. Furthermore, we introduce the minimum all-suffixes compact DAWG (MASCDAWG). CDAWGs are known to be more space-economical than DAWGs, and thus MASCDAWG(w) requires less space than MASDAWG(w). We present an on-line (right-to-left) algorithm to build MASCDAWG(w) without suffix links, whose running time is also linear in its size.
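To make the definition concrete, the following minimal sketch builds one separate DAWG per suffix of w, which is what the k-th initial node of MASDAWG(w) must be equivalent to. It is not the algorithm of the paper: it uses the textbook left-to-right construction with suffix links and shares nothing between suffixes, whereas the paper reads w right to left, avoids suffix links, and merges the structures. The names `build_dawg`, `accepts`, and `dawgs_for_all_suffixes` are hypothetical helpers introduced only for this illustration.

```python
def build_dawg(s):
    """Build the DAWG (suffix automaton) of s; return its transition table.

    Textbook on-line construction WITH suffix links (an assumption of this
    sketch, not the suffix-link-free method described in the abstract).
    State 0 is the initial node.
    """
    nxt = [{}]       # nxt[v][c] = target state of the edge labeled c from v
    link = [-1]      # suffix links
    length = [0]     # length of the longest string reaching each state
    last = 0
    for ch in s:
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        # Add the new edge to every state on the suffix-link path lacking it.
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt


def accepts(nxt, t):
    """Return True if t labels a path from the initial node (t is a substring)."""
    state = 0
    for ch in t:
        if ch not in nxt[state]:
            return False
        state = nxt[state][ch]
    return True


def dawgs_for_all_suffixes(w):
    """Naive stand-in for MASDAWG(w): |w| + 1 independent DAWGs, one per suffix."""
    return [build_dawg(w[k:]) for k in range(len(w) + 1)]


if __name__ == "__main__":
    w = "abcbc"
    dawgs = dawgs_for_all_suffixes(w)
    # Total nodes when nothing is shared between the suffixes' DAWGs.
    print(sum(len(d) for d in dawgs))
```

Comparing the summed node count of these independent automata with the size of a shared structure is one way to see why a merged, minimized graph is attractive.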

Similar resources

Suffix Array of Alignment: A Practical Index for Similar Data

The suffix tree of alignment is an index data structure for similar strings. Given an alignment of similar strings, it stores all suffixes of the alignment, called alignment-suffixes. An alignment-suffix represents one suffix of a string or suffixes of multiple strings starting at the same position in the alignment. The suffix tree of alignment makes good use of similarity in strings theoretica...


Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction

Given a string S of n symbols, a longest common extension query LCE(i, j) asks for the length of the longest common prefix of the ith and jth suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42–50, 2014, Proc. CPM 2015:65–76) described several data structures for answeri...
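As a baseline for the query defined above, LCE(i, j) can be answered by direct character comparison with no preprocessing. This is a minimal sketch, not one of the data structures the snippet refers to; the function name `lce` and the use of 0-based suffix positions are assumptions made for the illustration.

```python
def lce(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:] (0-based i, j).

    Naive scan: no extra space, time proportional to the answer.
    """
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    print(lce("abcabcab", 0, 3))
```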


Suffix Tree

SYNONYMS: Compact suffix trie. DEFINITION: The suffix tree S(y) of a non-empty string y of length n is a compact trie representing all the suffixes of the string. The suffix tree of y is defined by the following properties:

  • All branches of S(y) are labeled by all suffixes of y.
  • Edges of S(y) are labeled by strings.
  • Internal nodes of S(y) have at least two children.
  • Edges outgoing an intern...


Space Efficient Linear Time Construction of Suffix Arrays

We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and t...


Efficient de novo assembly of large genomes using compressed data structures - Supplemental Materials and Methods

The suffix array is a compact representation of the lexicographic ordering of the suffixes of a text [1]. Each element of the array is an index into the original string; SA_X[i] = j indicates that the suffix starting at position j in T is the i-th lowest suffix in X. As an example consider the string T = AGATCGATA$. The suffix array of T is SA_T = [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]. As the suffix a...
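The worked example for T = AGATCGATA$ can be checked with a naive construction that simply sorts the 1-based start positions by the suffixes they begin. This is only an illustrative sketch (the function name `suffix_array` is hypothetical, and sorting whole suffixes is far slower than the linear-time methods mentioned elsewhere on this page); it relies on `$` comparing lower than the letters, which holds for Python's code-point ordering.

```python
def suffix_array(text):
    """Return 1-based start positions of the suffixes of text in sorted order.

    Naive approach: compare full suffixes, so this is for small examples only.
    """
    return sorted(range(1, len(text) + 1), key=lambda j: text[j - 1:])


if __name__ == "__main__":
    T = "AGATCGATA$"
    print(suffix_array(T))  # prints [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]
```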



Publication date: 2002